National Repository of Grey Literature
Extension of Apache Tika with Industrial File Formats Text Extraction
Rešetár, René ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The goal of this bachelor's thesis was to extend the parsers of the Apache Tika project to extract data and tables from industrial document formats produced by laboratory instruments. The extracted data are stored in a structured format that conforms to a given schema. The theoretical part examines the supplied industrial formats, the Apache Tika project, and the ways in which it can be extended. In the practical part, a tool was designed and implemented that uses Apache Tika to classify documents, processes them, converts their content into structured JSON data, and subsequently validates that data. Finally, a set of tests was created to verify and demonstrate the properties of the solution.
